Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
@claude solve the merge conflicts
Claude finished @functionstackx's task in 2m 0s — View job

Merge Conflicts Resolved

Resolved merge conflicts in 3 files:
- .github/configs/nvidia-master.yaml: keep the expanded search space from this PR
- benchmarks/single_node/minimaxm2.5_fp8_b200.sh: keep --kv-cache-dtype fp8 and --stream-interval 20, with --no-enable-prefix-caching from main
- perf-changelog.yaml: keep both main's new entries and this PR's entry

Pushed commit.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
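The serving flags kept from each side of the benchmark-script conflict can be sketched as a single argument list. This is a hypothetical fragment: the surrounding `vllm serve` invocation and the `<model>` placeholder are illustrative assumptions, since the full command line is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical fragment: the three flags named in the conflict resolution,
# combined into one argument list. The full serve command is an assumption.
SERVE_ARGS=(
  --kv-cache-dtype fp8            # kept from this PR's branch
  --stream-interval 20            # kept from this PR's branch
  --no-enable-prefix-caching      # kept from main
)
echo "vllm serve <model> ${SERVE_ARGS[*]}"
```

Keeping the flags in one array like this makes future conflicts in the same script smaller, since each side of a merge touches individual array lines rather than one long command string.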
functionstackx left a comment
hi @faradawn @kedarpotdar-nv can you update & merge your existing recipe PR first before we merge this InferenceX PR? vllm-project/recipes#272
Usually we allow the InferenceX PR to merge first, since the recipe PR usually follows as a fast follow, but this time the recipes PR needs to merge first because the previous PR's recipe update still hasn't merged.
@claude remove isl: 1024 osl: 8192
Claude finished @functionstackx's task in 57s — View job

Remove ISL 1024 OSL 8192 Configuration

Removed the ISL 1024 / OSL 8192 seq-len config (with tp: 2 and tp: 4 search-space entries). Pushed commit.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Validated run here: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23694341288
@kedarpotdar-nv can we not use a nightly image here? The MiniMax M2 arch has been around for multiple months now, and from looking at the master.yaml files, MiniMax is using release images everywhere else.
@Klaud-Cold please remove export VLLM_USE_DEEP_GEMM=0 from the MiniMax B200 FP8 vLLM script.
@kedarpotdar-nv it is @claude
@functionstackx this PR is ready for review. We are using the release 0.19 container now.
@claude add pr description in accordance with perf changelog |
.github/configs/nvidia-master.yaml
Outdated

```yaml
- { tp: 2, ep: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 512 }
- { tp: 4, ep: 4, conc-start: 16, conc-end: 64 }
# - { tp: 8, conc-start: 4, conc-end: 8 }
```
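One plausible reading of a search-space row (an assumption; the harness's actual sweep rule is not shown in this thread) is a concurrency sweep that doubles from conc-start up to conc-end:

```python
# Hypothetical interpretation of a search-space entry's conc-start/conc-end
# fields: a doubling concurrency sweep. The real benchmark harness may step
# differently; this only illustrates the shape of the search space.
def concurrency_sweep(conc_start: int, conc_end: int) -> list[int]:
    values = []
    c = conc_start
    while c <= conc_end:
        values.append(c)
        c *= 2  # double the concurrency at each step
    return values

print(concurrency_sweep(4, 256))  # → [4, 8, 16, 32, 64, 128, 256]
```

Under this reading, the `tp: 4` row (conc-start 4, conc-end 512) sweeps one more point than the `tp: 2, ep: 2` row, which is why widening conc-end expands the search space.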
Claude finished @cquil11's task in 39s — View job

Add PR Description
PR description has been updated with a summary of all changes from the perf-changelog and diffs: image upgrade (v0.17.0 → v0.19.0), search space expansions, new EP entries, removed seq-len config, and all benchmark script parameter changes. |
```diff
  export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl

- if [ "$EP_SIZE" -ge 1 ]; then
+ if [ "$EP_SIZE" -gt 1 ]; then
```
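The effect of switching the guard from -ge to -gt can be sketched as below. The flag appended inside the branch is an illustrative assumption about what the guard controls; only the comparison change itself is shown in the diff above.

```shell
#!/usr/bin/env bash
# Sketch of the changed guard: with -gt, EP_SIZE=1 no longer takes the
# expert-parallel branch, whereas -ge 1 would have taken it for any
# EP_SIZE >= 1. The flag appended here is an illustrative assumption.
EP_SIZE=${EP_SIZE:-1}
EP_ARGS=()
if [ "$EP_SIZE" -gt 1 ]; then
  EP_ARGS+=(--enable-expert-parallel)
fi
echo "EP args: ${EP_ARGS[*]:-none}"
```

With the old -ge 1 test, a single-expert-group run (EP_SIZE=1) would still have enabled the expert-parallel path, which the -gt 1 form avoids.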
perf-changelog.yaml
Outdated

```yaml
- config-keys:
    - minimaxm2.5-fp8-b200-vllm
  description:
    - "Update vLLM image from v0.17.0 to v0.18.0 for MiniMax-M2.5 FP8 B200"
```
update to 0.19.0 @claude
also change the PR description to reflect this
Claude finished @cquil11's task in 52s — View job

Update perf-changelog to 0.19.0 & PR Description

- Read perf-changelog.yaml to find the entry to update
- Update version from v0.18.0 → v0.19.0 in perf-changelog.yaml
- Update PR description to reflect v0.19.0
- Commit and push

Updated the perf-changelog entry (v0.18.0 → v0.19.0) and the PR description. Pushed commit 0d734c1 to nv/minimax-vllm018.
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

Summary

Upgrade the MiniMax-M2.5 FP8 B200 vLLM benchmark configuration from v0.17.0 to v0.19.0 with an expanded search space and tuned serving parameters.

Changes

Image Upgrade
- v0.17.0-cu130 to v0.19.0-cu130

Search Space Updates (nvidia-master.yaml)

Benchmark Script Updates (minimaxm2.5_fp8_b200.sh)
- VLLM_USE_FLASHINFER_MOE_FP8=0 and VLLM_MOE_USE_DEEP_GEMM=0 env vars
- VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
- EP check: -gt 1 instead of -ge 1
- --kv-cache-dtype fp8
- --max-cudagraph-capture-size 2048
- --max-num-batched-tokens based on ISL
- --stream-interval 20
- --gpu-memory-utilization from 0.95 to 0.90

Validated Run
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23694341288